Technical Report: Ratio Threshold Queries over Distributed Data Sources
Abstract
Continuous aggregation queries over dynamic data are used for real-time decision making and timely business intelligence. In this paper we consider queries in which a client wants to be notified when the ratio of two aggregates over distributed data crosses a specified threshold. Consider two scenarios: a mechanism that defends against distributed denial-of-service attacks may be triggered when the packets arriving at a subnet account for more than 5% of the total packets; or a distributed store chain may withdraw its discount on luxury goods when sales of luxury goods constitute more than 20% of overall sales. The challenge in executing such ratio threshold queries (RTQs) lies in incurring only the minimal communication necessary to propagate updates from the data sources to the aggregator node where the client query is executed. We address this challenge by proposing schemes that convert the client ratio threshold condition into conditions on the individual distributed data sources. Whenever the condition associated with a source is violated, the source pushes its data values to the aggregator, which in turn pulls data values from other sources to determine whether the client threshold condition is indeed violated. We present algorithms that minimize the number of source condition violations (i.e., the number of pushes) while ensuring that no violation of the client threshold condition is missed. Further, for the case of a source condition violation, we propose efficient selective pulling algorithms that intelligently choose the additional sources whose data the aggregator should pull. Using a performance evaluation on synthetic and real traces of data updates, we show that our algorithms require up to an order of magnitude fewer messages than existing approaches in the literature.
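The idea of turning one global ratio condition into per-source conditions can be illustrated with a minimal sketch. It is not the algorithm from the report: it assumes the standard linearization of sum(a_i) / sum(b_i) <= tau into sum(a_i - tau * b_i) <= 0 (valid when the denominator is positive), and it splits the available slack equally across sources, which is an illustrative policy rather than an optimized one. The names `Source`, `assign_bounds`, and `ratio_exceeds` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """One distributed data source (hypothetical model, not from the report)."""
    a: float            # numerator value, e.g. luxury-goods sales at this store
    b: float            # denominator value, e.g. overall sales at this store
    bound: float = 0.0  # local bound c_i assigned by the aggregator

    def surplus(self, tau: float) -> float:
        # This source's contribution to sum(a_i) - tau * sum(b_i).
        return self.a - tau * self.b

    def violates(self, tau: float) -> bool:
        # Local condition is a_i - tau * b_i <= c_i; a violation triggers a push.
        return self.surplus(tau) > self.bound


def assign_bounds(sources: list[Source], tau: float) -> None:
    """Give each source an equal share of the global slack (assumed policy).

    The bounds sum to zero, so if every local condition holds, then
    sum(a_i - tau * b_i) <= 0 and the global ratio cannot exceed tau.
    """
    slack = -sum(s.surplus(tau) for s in sources)
    for s in sources:
        s.bound = s.surplus(tau) + slack / len(sources)


def ratio_exceeds(sources: list[Source], tau: float) -> bool:
    """Aggregator-side check once all values have been pulled."""
    return sum(s.a for s in sources) > tau * sum(s.b for s in sources)
```

In this sketch a local violation does not by itself imply a global one: the aggregator would still call `ratio_exceeds` on pulled values and, on a false alarm, call `assign_bounds` again to redistribute the slack. Which sources to pull, and how to size the bounds so that pushes are rare, are the questions the report's algorithms address.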
Similar resources
Distributed Threshold Querying of General Functions by a Difference of Monotonic Representation
The goal of a threshold query is to detect all objects whose score exceeds a given threshold. This type of query is used in many settings, such as data mining, event triggering, and top-k selection. Often, threshold queries are performed over distributed data. Given database relations that are distributed over many nodes, an object’s score is computed by aggregating the value o...
Verteilung globaler Anfragen auf heterogene Stromverarbeitungssysteme
Deployment of Global Queries in Distributed and Heterogeneous Stream Processing Systems. Distributed in-network stream processing is more efficient than sending all data to a central processing unit. In the past few years Stream-Processing Systems (SPSs) have established themselves as an interesting alternative to database systems for continuous query processing. There are many scenarios having w...
Search for the Best but Expect the Worst - Distributed Top-k Queries over Decreasing Aggregated Scores
We consider distributed top-k queries in wide-area networks where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers. In contrast to existing work, we exclusively consider distributed top-k queries over decreasing aggregated values. State-of-the-art distributed top-k algorithms usually depend on threshold propagation to reduce expen...
Using First-Order Logic to Query Heterogeneous Internet Data Sources
This paper describes an approach to formulating queries in the language of first-order logic over data from disparate sources distributed over a network. The data sources are treated as if they were all in a common database. The data sources may incorporate different stored or computed methods of providing data: web services and REST APIs, XML/JSON repositories, web pages, full-featured databases...
DeXIN: An Extensible Framework for Distributed XQuery over Heterogeneous Data Sources
In the Web environment, rich, diverse sources of heterogeneous and distributed data are ubiquitous. In fact, even the information characterizing a single entity like, for example, the information related to a Web service is normally scattered over various data sources using various languages such as XML, RDF, and OWL. Hence, there is a strong need for Web applications to handle queries over het...